Microarray Expression Data Analysis: Effect of Methylglyoxal on Transcriptome Changes in Visceral Adipose Tissue

logo

Introduction

Methylglyoxal (MG), a highly reactive dicarbonyl compound, is inevitably formed as a by-product of glycolysis. Methylglyoxal is a major cell-permeant precursor of advanced glycation end-products (AGEs), which are associated with several pathologies including diabetes, aging and neurodegenerative diseases. [Source]

Methylglyoxal: formation, degradation, and cellular effects [source]

Excessive dicarbonyl generation leads to AGE production, activates inflammatory processes, increases oxidative stress and impairs glucose tolerance, all of which cause metabolic and vascular changes in different tissues. MG is recognized as a trigger for the development and progression of diabetic complications. MG accumulation in adipocytes after exposure to MG causes structural and functional changes in adipose tissue independently of obesity; these alterations may precede the onset of metabolic syndrome and type 2 diabetes (T2D). Indirectly, MG can disturb and impair different signaling pathways and also trigger epigenetic changes . However, the exact mechanism underlying the pathophysiological role of MG and glyoxalase-1 in adipose tissue is not yet fully understood. [Source]

Introduction

Microarray Affymetrix data for the experiment were obtained here:[Source]

The main purpose of this experiment was to investigate the effect of methylglyoxal on transcriptome and metabolic changes in visceral adipose tissue using a prediabetic rat. All experimental procedures were carried out using 13 rats (6 control rats and 7 MG-treated rats). In the MG-treated group of rats, MG was administered intragastrically three times a week at a dose of 0.5 mg/kg bodyweight for four weeks. In the control group, water was administered intragastrically over the same period.[Source]

Libraries

library(here)
library(conflicted)
library(emo)
library(tidyverse)
library(glue)
library(oligo)
library(limma)
library(ReportingTools)
library(lattice)
library(sva)
library(ragene21sttranscriptcluster.db)
source(here("age_library.R"))
library(DESeq2)
conflict_prefer("rowRanges", "MatrixGenerics")
library(magrittr)
library(stringr)
library(glue)
library(ggplot2)
library(goseq)
library(clusterProfiler)
library("biomaRt")

Input file

DATA_DIR <- here("madata")
EXPERIMENT_DATA_DIR <- here(DATA_DIR, "E-MTAB-9013.sdrf.txt")

Data preprocessing

Now, we’ll read the data:

data <- readr::read_delim(EXPERIMENT_DATA_DIR, delim = "\t", progress = FALSE) %>%dplyr::select(1, 27, 25) %>%magrittr::set_colnames(c("sample_name", "sample_type", "cel_file"))
data[data == "methyl glyoxal"] <- "methylglyoxal"
data[data == "none"] <- "control"
data

We’ll remodel our dataframe:

data <- as.data.frame(data) %>%magrittr::set_rownames(.$sample_name)
data

Load the data from the madata file:

cel_files <- glue("madata/{data$cel_file}")
raw_data <- read.celfiles(cel_files)
## Reading in : madata/CTL1.cel
## Reading in : madata/CTL2.cel
## Reading in : madata/CTL3.cel
## Reading in : madata/CTL4.cel
## Reading in : madata/CTL5.cel
## Reading in : madata/CTL6.cel
## Reading in : madata/EXP1.cel
## Reading in : madata/EXP2.cel
## Reading in : madata/EXP3.cel
## Reading in : madata/EXP4.cel
## Reading in : madata/EXP5.cel
## Reading in : madata/EXP6.cel
## Reading in : madata/EXP7.cel

Let’s prepare metadata for AnnotatedDataFrame:

sampleNames(raw_data) <- data$sample_name
metadata <- data.frame(labelName = colnames(data),labelDescription = c("", "Used drug.", "")
)
metadata
(data <- AnnotatedDataFrame(data = data, varMetadata = metadata))
## An object of class 'AnnotatedDataFrame'
##   rowNames: CTL1 CTL2 ... EXP7 (13 total)
##   varLabels: sample_name sample_type cel_file
##   varMetadata: labelName labelDescription

We’ve created AnnotatedDataFrame:

phenoData(raw_data) <- Biobase::combine(phenoData(raw_data), data)
phenoData(raw_data)
## An object of class 'AnnotatedDataFrame'
##   rowNames: CTL1 CTL2 ... EXP7 (13 total)
##   varLabels: index sample_name sample_type cel_file
##   varMetadata: labelDescription channel labelName

Technical quality control

We’ll now look at the technical quality control and decide if the experiment was carried out properly.

Pseudo-image

We check if the image contains something like “random noise”.

image(raw_data,which = 10, transfo = log2)

Image seems to be fine.

MA plot

Now, we’ll create MA plot for three samples:

MAplot(raw_data[, 1:3], pairs = TRUE)

Boxplots

We’ll plot the boxplots to see whether they are similiar or not:

boxplot(raw_data, target = "core")

These boxplots seem to be very similiar.

Probe level model

Another technical quality control step consists of fitting probe level model using a robust regression.

fit_plm <- fitProbeLevelModel(raw_data)
## Background correcting... OK
## Normalizing... OK
## Summarizing... OK
## Extracting...
##   Estimates... OK
##   StdErrors... OK
##   Weights..... OK
##   Residuals... OK
##   Scale....... OK
image(fit_plm, which = 10)

image(fit_plm, which = 10, type = "sign.residuals")

Fortunately, we see a random noise.

Relative log expression (RLE)

Boxplots should be located around 0.

RLE(fit_plm)

Luckily, we see that boxplots are located around value 0.

Normalized unscaled standard errors (NUSE)

Boxplots should be located around value 1.

NUSE(fit_plm)

We can see that boxplots are located around value 1.

Normalization and probe annotation

We normalize the data:

norm_data <- rma(raw_data)
## Background correcting
## Normalizing
## Calculating Expression

Next step is to annotate the probe IDs to gene IDs. We’ll look at the featureNames of our norm_data as we use them as the keys for the annotation.

head(featureNames(norm_data)) 
## [1] "17600001" "17600003" "17600005" "17600007" "17600009" "17600011"

We’ll look at the database corresponding to our chip: RaGene-2_1-st, which is containing the right rat geneIDs to our probe IDs.

AnnotationDbi::head(keys(ragene21sttranscriptcluster.db))
## [1] "17600001" "17600003" "17600005" "17600007" "17600009" "17600011"

Now, we’ll load the database and annotate the probeIDs to geneIDs.

feature_data <- AnnotationDbi::select(ragene21sttranscriptcluster.db, columns = c("PROBEID", "ENSEMBL", "SYMBOL", "GENENAME", "ENTREZID"), keys = featureNames(norm_data), keytype = "PROBEID")
feature_data
table(is.na(feature_data$ENSEMBL))
## 
## FALSE  TRUE 
## 20562 17563

Unfortunately, as you can see we managed to annotate something around 50% of our rat genes.

So, we’ll look at another solution and try database from Biomart. Here, we annotate the probes from Biomart:

library("biomaRt")
ensembl = useMart(biomart= "ensembl",dataset= "rnorvegicus_gene_ensembl")
affy_ensembl= c("affy_ragene_2_1_st_v1", "ensembl_gene_id", "uniprot_gn_symbol", "external_gene_name", "entrezgene_trans_name")
ragenedf <- getBM(attributes= affy_ensembl, mart= ensembl, values = "*", uniqueRows=T)
colnames(ragenedf) <- c('PROBEID','ENSEMBL','SYMBOL', 'GENENAME','ENTREZID')
ragenedf%>% arrange(PROBEID)%>% head()

Now, we merge the results from database: ragene21sttranscriptcluster.db and Biomart.

conflict_prefer("select", "dplyr")
feature_data$ENSEMBL[is.na(feature_data$ENSEMBL)] <- ragenedf$ENSEMBL[match(feature_data$PROBEID,ragenedf$PROBEID)][which(is.na(feature_data$ENSEMBL))]
feature_data$SYMBOL[is.na(feature_data$SYMBOL)] <- ragenedf$SYMBOL[match(feature_data$PROBEID,ragenedf$PROBEID)][which(is.na(feature_data$SYMBOL))]
table(is.na(feature_data$ENSEMBL))
## 
## FALSE  TRUE 
## 27182 10943

Eventually, we managed to annotate something about 2/3 of our probes. Let’s look at the duplicates of PROBEID in our dataframe:

janitor::get_dupes(feature_data, PROBEID) %>% head()

We’ll get rid PROBEID duplicates:

feature_data <- dplyr::distinct(feature_data, PROBEID, .keep_all = TRUE)

And lastly we filter the norm_data by feature_data:

norm_data <- norm_data[feature_data$PROBEID, ]

if (any(feature_data$PROBEID != featureNames(norm_data)))
  stop("Feature data mismatch.")

fData(norm_data) <- feature_data
annotation(norm_data) <- "ragene21sttranscriptcluster.db"

Let’s look at the final number of our annotated genes.

table(is.na(fData(norm_data)$ENSEMBL))
## 
## FALSE  TRUE 
## 25813 10872

Exploratory analysis - biological quality control

Finally, we’ll look at our data.

For plotting the following graphs we’ll use age_library.R script:

groups <- pData(norm_data)$sample_type
plot_hc(exprs(norm_data), color_by = groups, color_by_lab = "Sample_type")

plot_pca(exprs(norm_data), sample_data = pData(norm_data), n_top_features = 1000, color_by = "sample_type", plot_type = "multi")$plot

plot_heatmap(
  exprs(norm_data),
  n_top_features = 1000,
  z_score = TRUE,
  column_annotation = dplyr::select(pData(norm_data), sample_type),
  title = "Affymetrix",
  legend_title = "z-score",
  show_row_names = FALSE
)

From the plotted graphs we can say that our data are not affected by batch effect.

Differential expression analysis (DEA)

Let’s decide which genes are differentially expressed and which not. For that we will use linear models and package limma.

We’ll create a matrix:

group <- pData(norm_data)$sample_type %>% factor()
group <- relevel(group, "control")
dea_model <- model.matrix(~ group)
colnames(dea_model)[1] <- "Intercept"
dea_model
##    Intercept groupmethylglyoxal
## 1          1                  0
## 2          1                  0
## 3          1                  0
## 4          1                  0
## 5          1                  0
## 6          1                  0
## 7          1                  1
## 8          1                  1
## 9          1                  1
## 10         1                  1
## 11         1                  1
## 12         1                  1
## 13         1                  1
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

And fit our data to it:

fit <- lmFit(norm_data, dea_model) %>% eBayes()
colnames(fit)
## [1] "Intercept"          "groupmethylglyoxal"

We can use the topTable function to look at the top differentially expressed genes (DEGs).

topTable(fit, coef = "groupmethylglyoxal")
df_top <- topTable(fit, coef = "groupmethylglyoxal")
df_top <- dplyr::filter(as.data.frame(df_top), abs(logFC) > 1 & adj.P.Val < 0.1)
df_top

However, if we look at the topTable and use threshold FDR <= 0.1 and |LFC| > 1 we obtained only 2 DEGs.

Visualization of DEG

Now, we’ll look closer at our DEGs, which are genes: “Nr1d1” and “Cish”.

Data preparation

We’ll prepare dataframe containing all the samples expression from the exprs(norm_data).

data_long <- exprs(norm_data)%>%
  as.data.frame() %>%
  tibble::rownames_to_column("PROBEID") %>%
  tidyr::pivot_longer(-PROBEID, names_to = "sample_name", values_to = "E") %>%
  dplyr::left_join(fData(norm_data), by = "PROBEID") %>%
  dplyr::left_join(pData(norm_data), by = "sample_name")

head(data_long)

We’ll select from this dataframe only our DEGs:

which(grepl("Nr1d1", data_long$SYMBOL))
##  [1] 138477 138478 138479 138480 138481 138482 138483 138484 138485 138486 138487 138488 138489
which(grepl("Cish", data_long$SYMBOL))
##  [1] 407486 407487 407488 407489 407490 407491 407492 407493 407494 407495 407496 407497 407498
data_long1<- rbind(data_long[407486:407498,], data_long[138477:138489,])

Boxplots

Now, we’ll look at the boxplots of our DEGs.

plot_boxplots(
  data_long1,
  x = "sample_type",
  y = "E",
  facet_by = "SYMBOL",
  color_by = "sample_type",
  main = "Affymetrix",
  x_lab = "Sample_Group",
  y_lab = "log2(expression intensity)",
  do_t_test = FALSE
) +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

So, as we can see Cish gene is downregulated gene and Nr1d1 is a upregulated gene be methylglyoxal.

Volcano plot

Let’s look at volcano plot to see statistically significant genes.

deg<- topTable(fit, coef = "groupmethylglyoxal",number= 10000)
threshold_OE <- deg$adj.P.Val < 0.1
deg$threshold <- threshold_OE
ggplot(deg) +
        geom_point(aes(x=logFC, y=-log10(adj.P.Val), colour=threshold)) +
        ggtitle("Volcano plot of DEG") +
        xlab("log2 fold change") + 
        ylab("-log10 adjusted p-value") +
        #scale_y_continuous(limits = c(0,50)) +
        theme(legend.position = "none",
              plot.title = element_text(size = rel(1.5), hjust = 0.5),
              axis.title = element_text(size = rel(1.25)))  

Unfortunately, we have only 2 of DEGs.

GGplot

Let’s look at their expressions:

ggplot(data_long1) +
        geom_point(aes(x = SYMBOL, y = E, color = sample_type), position=position_jitter(w=0.1,h=0)) +
        scale_y_log10() +
        xlab("Genes") +
        ylab("Expression") +
        ggtitle("Top 4 Significant DE Genes") +
        theme_bw() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(plot.title=element_text(hjust=0.5))

Heatmap

Let’s prepare the data for heatmap of our DEGs:

subset_g <- dplyr::filter(feature_data, ENSEMBL %in% c("ENSRNOG00000009329","ENSRNOG00000029543"))
subset_genes<- exprs(norm_data)[subset_g$PROBEID,]
annotation <- data.frame(sampletype=pData(norm_data)[,'sample_type'], 
                     row.names=rownames(pData(norm_data)))

Let’s make a heatmap:

heat_colors <- brewer.pal(6, "YlOrRd")
pheatmap(subset_genes, color = heat_colors,border_color = NA, cluster_rows = T, show_rownames=F, fontsize = 10, scale="row",fontsize_row = 10, annotation_col = annotation, legend = FALSE,   cellwidth = 30, cellheight = 50)

Gene Set Enrichment Analysis (GSEA)

GSEA s a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with disease phenotypes. The method uses statistical approaches to identify significantly enriched or depleted groups of genes.[Source]

However, from our data we obtained only 2 DEGs, which seems to be low to run GSEA. Well, at least we’ll try and see what the results will be.

Let’s prepare our dataframe with genes statistics from topTable.

res_dex_shrink <- topTable(fit, coef = "groupmethylglyoxal",number = 36685)
res_dex_shrink

Let’s filter the dataframe and get rid of duplicates.

res_dex_shrink <- res_dex_shrink %>% drop_na()
conflict_prefer("select", "dplyr")
res_dex_shrink<- dplyr::distinct(res_dex_shrink, ENSEMBL, .keep_all = TRUE)
res_dex_shrink<- dplyr::distinct(res_dex_shrink, PROBEID, .keep_all = TRUE)
res_dex_shrink<- dplyr::distinct(res_dex_shrink, ENTREZID, .keep_all = TRUE)
res_dex_shrink

Prepare the vector of LFC.

entrez_lfc <- res_dex_shrink$logFC
names(entrez_lfc) <- res_dex_shrink$ENTREZID
entrez_lfc <- entrez_lfc[order(entrez_lfc, decreasing = TRUE)]

Let’s create a named vector ranked based on the t-statistic values. We need to sort the t-statistic values in descending order here.

t_vector <- res_dex_shrink$t
names(t_vector) <- res_dex_shrink$ENTREZID
t_vector <- sort(t_vector, decreasing = TRUE)
head(t_vector)
##   252917   303836   297822   117054    24440   681086 
## 8.708741 5.819923 5.148367 4.843744 4.790586 4.574141

Run GSEA on KEGG pathways:

gsea_kegg_results <- gseKEGG(
  geneList = t_vector,
  # KEGG organism ID
  organism = "rno",
  # Key type is ENTREZ ID.
  keyType = "ncbi-geneid",
  # Correct p-values for FDR.
  pAdjustMethod = "fdr",
  # FDR adjusted p-value threshold.
  # We are OK with 10% of false positives among all pathways called significant.
  pvalueCutoff = 0.1,
  # Set a constant seed so you will get reproducible results using the same data.
  seed = 1,
  verbose = TRUE
)

See the results below:

as.data.frame(gsea_kegg_results)

We will convert ENTREZIDs to gene symbols using our rat gene database:

gsea_kegg_results <- setReadable(gsea_kegg_results, "ragene21sttranscriptcluster.db", keyType = "ENTREZID")
gsea_kegg_results_filtered <- dplyr::filter(gsea_kegg_results, p.adjust < 0.05)

Finally, we’ll plot the results:

GSEA plot

library(enrichplot)
gseaplot2(gsea_kegg_results, geneSetID = 1:3, pvalue_table = TRUE, ES_geom = "dot")

Dotplot

conflict_prefer("dotplot", "enrichplot") 
dotplot(gsea_kegg_results, showCategory = 15, x = "GeneRatio", font.size = 10)

Gene-Concept Network

p <- cnetplot(gsea_kegg_results, showCategory = 3, foldChange = entrez_lfc, colorEdge = TRUE)
p_file <- here("gsea_cnetplot.png")
ggsave(p_file, p, device = "png", width = 10, height = 10)

Gene-Concept Network in heatmap

p <- heatplot(gsea_kegg_results, foldChange = entrez_lfc, showCategory = 3) +
  theme(axis.text.x = element_text(angle = 90, vjust = 2, size = 7))
p_file <- here("gsea_heatmap.png")
ggsave(p_file, p, device = "png", width = 15, height = 5)

Enrichment map

gsea_kegg_results <- pairwise_termsim(gsea_kegg_results)
emapplot(gsea_kegg_results, color = "NES", showCategory = 20)

PubMed Central plot

terms <- gsea_kegg_results$Description[1:3]
p <- pmcplot(terms, 2010:2017)
p2 <- pmcplot(terms, 2010:2017, proportion = FALSE)
patchwork::wrap_plots(p, p2, ncol = 2, guides = "collect")

Signaling pathway impact analysis (SPIA)

Now we can use our results from GSEA to run SPIA. SPIA calculates: Over-representation (ORA) of DEGs in the pathway and perturbation (PERT): the influence of observed expression changes on pathway output.

So, let’s try i out.

KEGG_DATA_DIR <- here("kegg_data")
dir.create(KEGG_DATA_DIR)
kegg_ids <- gsea_kegg_results@result$ID[1:5]
purrr::map(kegg_ids, ~ download.file(glue("http://rest.kegg.jp/get/{.}/kgml"), glue("{KEGG_DATA_DIR}/{.}.xml")))
## [[1]]
## [1] 0
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 0
## 
## [[4]]
## [1] 0
## 
## [[5]]
## [1] 0
library(SPIA)

makeSPIAdata(
  kgml.path = KEGG_DATA_DIR,
  organism = "rno",
  out.path = KEGG_DATA_DIR
)
## [1] TRUE

Let’s set our thresholds and prepare gene statistics for SPIA.

PADJ_THRESHOLD <- 0.1
LFC_THRESHOLD <- 1

res_dex_shrink_ann <- topTable(fit, coef = "groupmethylglyoxal", number= 1000)
res_dex_shrink_ann <- res_dex_shrink_ann %>% drop_na()
res_dex_shrink_ann <- dplyr::distinct(res_dex_shrink_ann, ENSEMBL, .keep_all = TRUE)

deg_filter <- (!is.na(res_dex_shrink_ann$adj.P.Val)) & (res_dex_shrink_ann$adj.P.Val <= PADJ_THRESHOLD) & (abs(res_dex_shrink_ann$logFC) >= LFC_THRESHOLD)

deg_entrezids <- res_dex_shrink[deg_filter, "ENTREZID"]

Vector of LFCs of DEGs. Names are ENTREZIDs.

entrez_lfc_deg <- entrez_lfc[deg_entrezids]
entrez_lfc_deg <- entrez_lfc_deg[order(entrez_lfc_deg, decreasing = TRUE)]
spia_results <- spia(
  de = entrez_lfc_deg,
  all = res_dex_shrink$ENTREZID,
  organism = "rno",
  data.dir = paste0(KEGG_DATA_DIR, "/")
)
## 
## Done pathway 1 : Protein processing in endoplas..
## Done pathway 2 : Malaria..
## Done pathway 3 : Herpes simplex virus 1 infecti..
spia_results
plotP(spia_results, threshold = 0.3)

Report

Now, we’ll create the report for our DEGs using threshold FDR <= 0.1 and |LFC| > 1 and signpost by running script: report.R

report()
## 
## 
## DEA for group 'groupmethylglyoxal': 
  |                                                                                                        
  |                                                                                                  |   0%
  |                                                                                                        
  |.................................................                                                 |  50%
##    inline R code fragments
## 
## 
  |                                                                                                        
  |..................................................................................................| 100%
## label: unnamed-chunk-65 (with options) 
## List of 1
##  $ echo: logi FALSE
## 
## 
## /usr/lib/rstudio-server/bin/pandoc/pandoc +RTS -K512m -RTS dea_table_template.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output groupmethylglyoxal.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --include-in-header /tmp/RtmpK8G9zK/rmarkdown-str6c3d5a6b5667.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' 
## 
  |                                                                                                        
  |                                                                                                  |   0%
  |                                                                                                        
  |.................................                                                                 |  33%
##   ordinary text without R code
## 
## 
  |                                                                                                        
  |.................................................................                                 |  67%
## label: unnamed-chunk-1-2 (with options) 
## List of 1
##  $ echo: logi FALSE
## 
## 
  |                                                                                                        
  |..................................................................................................| 100%
##   ordinary text without R code
## 
## 
## /usr/lib/rstudio-server/bin/pandoc/pandoc +RTS -K512m -RTS dea_signpost.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output dea_signpost.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --include-in-header /tmp/RtmpK8G9zK/rmarkdown-str6c3d2e06f449.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'

Conclusion

In this analysis, we take a closer look on microarray data obtained from the experiment studying the effect of methylglyoxal on transcriptome changes in visceral adipose tissue in rats. Firstly, we carried out technical quality control to estimate if the experiment was done properly. By the results of QG we can say that the experiment was executed properly and there is no technical problem in our data.

Secondly, we had to annotate the probeIDs to geneIDs. In this part we dealt with an issue of having low number of annotated probeIDs, so we had to test multiple databases to get a reasonable number of annotated probeIDs. Furthermore, we performed exploratory analysis and found out that there is no batch effect in our dataset.

According to the results, Nr1d1 and Cish were the only one significantly affected genes in MG-treated rats. Based on the results Nr1d1 gene also known as “Nuclear Receptor Subfamily 1 Group D Member 1” was upregulated gene whereas Cish gene called “Cytokine Inducible SH2 Containing Protein” was downregulated gene.

We also run GSEA to detect the affected pathways. However, we had just 2 DEGs to work with, so the result from GSEA are not really relatable and can’t be used. These data were also analysed in this [paper]. According to their results they identified 78 differentially expressed genes, with 51 upregulated and 27 downregulated genes. The top upregulated genes were Nr1d1 and regulator of G-protein signaling 2: Rgs2. The top genes downregulated were Cish and insulin receptor substrate 1: Irs1. The most affected pathway was SAPK/JNK signaling pathway. Comparing the results we can say that the top DEGs corresponds, however in our analysis we did not end up with such a large number of DEGs and we did not estimate the affected pathways.

To sum up, we carried out a microarray expression data analysis where we estimated the most differently expressed genes. Those genes were Nr1d1 and Cish genes. We also compared the results with the published study and find out that our results correlate with the results of the published study.